Tutorial

Plugin Basics

Plugin Structure

A plugin is an object with the following properties.

  • name: required; identifies the plugin and distinguishes it from other plugins.
  • priority: plugins with a higher number are executed earlier; defaults to 0.
  • priorities: an alternative way to set priorities, using an object keyed by hook name.
  • before, after: called before and after the HTTP request.
  • start: called when the plugin is registered.
  • finish: called when the task queue is empty and the pending count is 0.

const getPlugin = options => ({
  name: "myPlugin",
  priority: 10,
  priorities: { before: 100 },
  // Final priorities: before = 100; after, finish = 10
  before: (task, crawler) => {},
  after: (task, crawler) => {},
  start: crawler => {},
  finish: crawler => {},
  onError: (error, task, crawler) => {}
});
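
The way priority and priorities combine can be sketched roughly as follows. This is not crawlx's internal code: resolvePriority and orderForHook are hypothetical helpers written only to mirror the rule described above, where a per-hook entry in priorities overrides priority, and priority defaults to 0.

```javascript
// Hypothetical helpers, not part of crawlx: they only mirror the documented rule.
const resolvePriority = (plugin, hook) => {
  const perHook = plugin.priorities || {};
  if (perHook[hook] !== undefined) return perHook[hook];
  return plugin.priority !== undefined ? plugin.priority : 0;
};

// Plugins that define the given hook, highest priority first.
const orderForHook = (plugins, hook) =>
  plugins
    .filter(p => typeof p[hook] === "function")
    .sort((a, b) => resolvePriority(b, hook) - resolvePriority(a, hook));

// Illustrative plugins; the names "a", "b", "c" are made up.
const plugins = [
  { name: "a", priority: 10, priorities: { before: 100 }, before() {}, after() {} },
  { name: "b", priority: 50, before() {}, after() {} },
  { name: "c", before() {} }
];

const beforeOrder = orderForHook(plugins, "before").map(p => p.name);
const afterOrder = orderForHook(plugins, "after").map(p => p.name);
console.log(beforeOrder, afterOrder);
```

Under these assumptions, plugin "a" runs first for before (its per-hook priority is 100) but after plugin "b" for after (it falls back to priority 10, below 50).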

Register Plugin

A plugin's start method is called immediately when the plugin is registered on the crawler.

const x = require("crawlx").default;
x.use(getPlugin())
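
For a more concrete picture, a small plugin that counts requests might look like the sketch below. Only the property names (name, priority, before, after, finish) come from the plugin structure above; getCounterPlugin and the counts object are hypothetical. The hooks are invoked by hand with empty stand-in task and crawler objects so the snippet runs without crawlx installed.

```javascript
// Hypothetical example plugin; "counts" is a plain object supplied by the caller.
const getCounterPlugin = counts => ({
  name: "counter",
  priority: 0,
  // Count every request that is about to be sent.
  before: (task, crawler) => {
    counts.started += 1;
  },
  // Count every request that has completed.
  after: (task, crawler) => {
    counts.finished += 1;
  },
  // Report the totals once the crawler has nothing left to do.
  finish: crawler => {
    console.log(`requests finished: ${counts.finished}/${counts.started}`);
  }
});

// Manual invocation with stand-in arguments, for illustration only.
const demoCounts = { started: 0, finished: 0 };
const demoPlugin = getCounterPlugin(demoCounts);
demoPlugin.before({}, {});
demoPlugin.after({}, {});
demoPlugin.finish({});
```

In real use the object returned by getCounterPlugin would be registered the same way as getPlugin() above, and the crawler would call the hooks itself.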

Enable plugins in order

Sometimes plugins must be enabled in a specific order, meaning their start methods must be called in that order. For example, the resume plugin also saves and loads the dupFilter set, so the resume plugin should be enabled first.

(async () => {
  await x.crawler.use(pluginA);
  await x.crawler.use(pluginB);

  x("http://quotes.toscrape.com/");
})();
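
To see why each use call is awaited, consider plugins whose start method is asynchronous. The sketch below uses a stand-in fakeCrawler whose use simply calls plugin.start and returns the resulting promise. This is an assumption made for illustration, not crawlx's implementation, and makePlugin, sleep, and the delays are likewise invented.

```javascript
// Stand-in objects for illustration only; this is not crawlx's implementation.
const log = [];
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Builds a plugin whose start takes delayMs to complete (e.g. loading saved state).
const makePlugin = (name, delayMs) => ({
  name,
  start: async crawler => {
    log.push(`${name}:start`);
    await sleep(delayMs);
    log.push(`${name}:ready`);
  }
});

const fakeCrawler = {
  // Assumed behaviour: registering a plugin calls its start method right away.
  use: plugin => Promise.resolve(plugin.start(fakeCrawler))
};

// Awaiting each registration lets pluginA finish starting before pluginB begins,
// even though pluginA's start takes longer.
const registerInOrder = async () => {
  await fakeCrawler.use(makePlugin("pluginA", 20));
  await fakeCrawler.use(makePlugin("pluginB", 5));
  return log;
};
```

Without the await, both start methods would begin immediately and pluginB, with the shorter delay, would become ready first.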